On the estimation of normal copula discrete regression models 
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Abstract 

The continuous extension of a discrete random variable is amongst the computational 
methods used for estimation of multivariate normal copula-based models with discrete 
margins. Its advantage is that the likelihood can be derived conveniently under the the- 
ory for copula models with continuous margins, but there has not been a clear analysis 
of the adequacy of this method. We investigate the asymptotic and small-sample effi- 
ciency of two variants of the method for estimating the multivariate normal copula with 
univariate binary, Poisson, and negative binomial regressions, and show that they lead 
to biased estimates for the latent correlations, and the univariate marginal parameters 
that are not regression coefficients. We implement a maximum simulated likelihood 
method, which is based on evaluating the multidimensional integrals of the likelihood 
with randomized quasi Monte Carlo methods. Asymptotic and small-sample efficiency 
calculations show that our method is nearly as efficient as maximum likelihood for fully 
specified multivariate normal copula-based models. An illustrative example is given to 
show the use of our simulated likelihood method. 

Keywords: Continuous extension; Jitters; Multivariate normal copula; Rectangle prob- 
abilities; Simulated likelihood. 



1 Introduction 



For multivariate discrete data Y = {Yi,...,Yfi) given a vector of (continuous or dis- 
crete) covariates x = (xi,...,Xrf) with Xj S M^, j = l,...,d, the discretized multivari- 
ate normal (MVN) distribution, or the MVN copula w ith discrete margins, has been 



in use for a considerable lengt h of t ime, e.g. iJod (119971). and much earlier in the bio- 



statistics (Ashford a.nd Sowdenl . Il97d ). psychometrics ( MutherJ . Il978l ). and econometrics 



([Hausman and Wisd . Il978l ) literature. It is usually known as a multivariate, or multino- 
mial, probit model. The multivariate probit model is a simple example of the MVN copula 
with univariate probit regressions as the marginals. In the general case, the discretized 
MVN model has the following cumulative distribution function (cdf): 



Pr(yi < yi,. .. ,yd < yrf;x) = $4$-i[Fy,(yi; XI )],... ,$-n^yd(yd;xd)];R) 



:i) 



where ^d denotes the standard MVN distribution function with correlation matrix R 
{Pjk : I < j < k < d), ^ is the cdf of the univariate standard normal, and i<Vi(yi;xi), . . 
FY^{yd',^d) are the univariate discrete cdfs. 

Other mu ltivariate copulas for discrete response data have been around a long time, e.g. 



m 



Jog ( 19971 ). and earlier for some simple copula models. Simple parametric families of 
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copulas have a closed form cdf; hence the joint likelihood is straightforward to derive from 
the probability mass functi on (pmf ) as a finite differenc e of the cdf, but they p rovide l i mited 
dependence. For example. iMeester and MacKavl ( 19941 ) used a Frank copula ( Frankl . Il979l ) 
with a closed form cdf to model multivariate binary data. The Frank copula is a member 
of the Archimedean class of copulas, which are limited to exchangeable structures. 

The MVN copula generated by the MVN distribution inherits the useful properties of 
the latter, thus allowing a wide range for dependence, and overcom es the drawback of lim^ 



ited dependence inherent in simple parametric families of copulas (JNikoloulopoulos et al. 



20 111 ). The use of the MVN with logistic regression (or Poisson or negative binomial re- 
gression) is just a special case of the general theory of dependence modelling with cop- 
ulas. Implementation of the MVN copula for discrete data (discretized MVN) is possi- 
ble, but not easy, because the MVN distribution as a latent model for discrete response 
requires rectangle probabilities based on h igh-dimensional integrations or their approxi- 
mations ( Nikoloulopoulos and Karlia . l2009l ). Many appro aches have been c onsidered fo r 
com puti n g hig h-dimensional normal probabilities, see e.g. ISchervishI ( 19841 ). iGena ( 19921 ). 
and IJod ( 19951 ). These could be used to evaluate the normal copula-based likelihood for 
discrete data with gene ral dependence. 

F our recent papers (JHeinen and Rengifd . 120071 . l2008l : iMadsenl . l2009l : iMadsen and FaneJ . 
20 111 ) attempt to "approximate" the like lihood, by using the continu o us extension (CE) of 
a discrete r andom variable developed in iDenuit and LambertI ( 20051 ). iHeinen and Rengifo 
( 20071 . I2OO8I ) proposed a surrogate likelihood, assum ing the latent u nifor m variables in the 
CE of a discrete random variable are observed, while iMadsenI ( 20091 ) and lMadsen and Fang 
( 2OIII ) used also the CE but their method is actually an example of a simulated likelihood 
to compute the MVN rectangle probabilities. The methods based on the CE cannot be rec- 
ommended unt il its properties have been s tudied and compared to existing methods. The 
CE method in iDenuit and LambertI ( 20051 ) has been used to prove theoretical results for 
copula-based concordance measures for discrete data. Although its application to copula 
dependence modeling for discrete data is novel, its theoretical and small-sample efficiency 
has yet to be established in that context. The contribution in this paper is (a) to exam- 
ine thoroughly the accuracy and the adequacy of the surrogate and simulated likelihood 
method based on the CE using asymptotics and simulations; (b) t o imp r ove th e efficiency 
of simulated likelihood by transforming the rectangle integrals as in lCena ( 19921 ): and (c) to 
give precise guidelines for handling the MVN copula-based likelihood for regression models 
with dependent discrete response data. 

For ease of exposition, we consider the case that the univariate marginal parameters are 
common to different univariate margins; 7 denotes the r-dimensional vector of univariate 
marginal parameters that are not regression coefficients, and (3 the p-dimensional vector of 
the regression parameters. The marginal means, E{Yj) = fJ,j, j = 1, . . . , d, depend on the 
vector of the covariates Xj, via the vector f3 and a link function, r]{fJ-j) = xT/3, j = 1, . . . ,d. 
The remainder of the paper proceeds as follows. Section 2 provides an overview of the sur- 
rogate and simulated likelihood method using the CE. In Section 3 we describe appropriate 
methods for handling the MVN copula-based likelihood for regression models with depen- 
dent discrete response data, and propose a maximum simulated likelihood method based 
on evaluating the multidimensional integrals of the likelihood with randomized quasi Monte 
Carlo methods. Section 4 and Section 5 contain theoretical (e.g. asymptotic properties of 
the estimators) and small sample efficiency calculations, respectively, to assess the accuracy 
of the methods. Section 6 p resents an applicat i on of the proposed simulated likelihood to 
the toenail infection data in lMadsen and Fang) ( 201ll ). We conclude this article with some 
discussion, followed by a technical Appendix. 



2 Surrogate and simulated likelihood using the CE 



Denuit and Lambertl ( 20051 ) proposed a CE of integer- valued random variables to study 



concordance measures for dependent discrete data. They associate an integer- valued random 
variable Y with a jittered random variable Y*, such that, 

Y* = Y + {V-1), 

where ^ is a uniform random variable in the unit interval and independent of Y. Y* is 
a CE of Y, and has support on the reals. Given a particular realization of the jittering, 
the (conditional) density of Y* is /y*|v(y*|v) = fyily*]), where \y*~\ is the ceiling of 
y*, or the smallest integer greater than or equal to y*. The (conditional) cdf of Y* is 
FY*\viy*\v) = Fydy*] — 1) + vfY{\y*~\)- It is easy to see that the simplified cdf and density 
of y* take the form Fy*|y(y*|ti) = -FV(y — l)+'y/y(y) and /y*|y(y*l^) = fviv)-, respectively. 

2.1 Surrogate likelihood based on the CE of a discrete random variable 

Heinen and Rengifd ( 20071 . 120081 ) were the first that adopt this approach to form a surrogate 



likelihood for MVN copula-based models with discrete margins (hereafter HR method). 
Copula-based models were originally developed for continuous respo nses where the density 



i s obt ained using partial derivatives of the multivariate cdf (see e.g. iNikoloulopoulos et al.l 



( 20121 )). and hence the numerical calculations are much simpler. The corresponding density 



for the jittered continuous vector Y* = (Y,*, . . . , Y^) given n independent standard uniforms 

v = (yi,...,yrf)is, 

d 

^Y*|v(y*|v;x) = cyFY*\v^{yV\Vi;:x.i), . . . ,Fy*|y^(y^|7;d;xrf);RJ JJ fY*\v^{y*\vj]yLj), 

j=i 

where v = (i;i, . . . , Vd) are realizations of the jitters V and c(wi, . . . , Ud] R) is the d-variate 
normal copula density. Since the MVN copula has a closed form density, 

where q = (gi, . . . , g^) with qj = ^~^{uj),j = 1, . . . , d and I^ is the d-dimensional identity 
matrix, the authors avoid the multidimensional integration by using the CE of the discrete 
random variables. The estimated parameters can be obtained by maximizing the surrogate 
log-likelihood, 

n 

^hr{(3, 7, R) = ^ log /iY*|v(yii, • • • , Vidlvii, ■■■: Vid] xji, . . . , Xid), 
i=i 

over the univariate and copula parameters (/3,7,R). In order to avoid the noise introduced 
by the jitters V, they use m jitters. That is, they simulate a vector of independent standard 
uniforms V^ = (Vi fci • • • j ^ fe) ^^^ maximize, 



^HRi^^ 7, R) = X] ^°s hY*\Vk{yii, ■■■, yid\vii,k, • • • , Vid,k; xji, • • • , x^^), 

for k = 1, . . . ,m. The HR estimates are the average of the estimates over the m runs. 



2.2 Simulated likelihood based on the CE of a discrete random variable 



MadserJ (J2009l ) and lMadsen and Fand (J201ll ) also adopt the CE of 1^ proposed bv lDenuit and Lambert 
(|2005l ). but they make use of importance functions in combination with simulated likelihood. 
Simulated likelihood for multivariate probit models has been used in econometrics since the 
early 1990s, and it is one of the recomrn ended methods for generalized linear mixed models, 
see e.g. Chapter 7 in iDemidenkd ( 20041 ) . The simulated likelihood method, applies impor- 
tance sampling to simulate the likelihood function, but this must be done in a way that the 
simulated likelihood changes little when the parameters of the model are perturbed a little. 
This can be accomplished by using the same set of random draws in an appropriate way. 
For importance sampling to work well, the integrand, say. 



e{z) dz 






g{z) dz = Ez 



eiz 



9{z) 



must be converted to a good form of an empirical average based on data simulated from 
a suitable importance sampling distribution g. Better choices of g have small variance for 
w{z) = e{z)/g{z), such that the importance weight w is bounded. A bad choice can lead to 
an inflated variance of the integral-estimate and thus to a very poor approximation. Hence, 
in order to use simulated likelihood in an efficient way the choice of the the importance 
sampling distribution is crucial. 

The simulated likelihood using the jitters (hereafter MF method) is, 






,yid\vii, 



; "^idy ^il; • • • 5 ^id) 



(2) 



i=l 



Essentially, the MF importance weight is unbounded. iMadsen and Fang! (120 111 ) write an 
exponential with quadratic form 2{q (Id — R~^)ci}) which is not positive definite, so for 
part of the q vector space, it can be arbitrarily negative; hence their integrand can be 
arbitrarily large. This is a poor choice of an importance function, and it is hard to achieve 
good accuracy for the numerical integral. 

The expected likelihood in ([2]) can be approximated by averaging over the jitters V^, k = 
1, . . . ,m, that is, 

rn n 

Lmf{I3, 7> R-) = — X] n ^Y*|Vfc iVii, ■■■, yid\vii,k, ■■■, Vid,k;^ii, ■■■ , ^id) 

^ k=l i=l 

d 



1 

m 



m I n 



;^E n ii^r"'^-p 



n d 



1 1 

-{q,^,(I,-R-l)q,,fe} HfYfwM. 

J j=i 

^ n a m / n f r ^ -|>.' 

=-nn/>'.(y^^'^^'^)E n ii^r'^'^^p -^{<k{^<^-'^~^)^^.^] i > (3) 

%=\j=\ k=\ \i=\ ^ L -I ^ / 

where qj,A.- = ($~M^yi*|yi,fe(?/jibii,fc;Xii)], . . . , $~^[Fy*|y^ ,^.(2/j*dl^irf,fc;^id)])- 

The MF estimates of (/3,7,R) are derived by maximizing the Imf = log-/^A/F with 
respect to the univariate and copula parameters. For the MF (and HR in subsection 12. ip 
likelihood, to work well in a numerical optimization routine, the evaluations via simulation 
have to be smooth (differentiable) when the parameters change by small amounts. In order 
to accomplish this, the same set of uniform random var iables should be used n o ma tter the 



parameter values in the iterative optimization; see e.g. iBhat and SidharthanI (120111 ). 



3 Appropriate methods for handling the likeUhood 

Estimation of the model parameters (/3, 7, R) can be approached by the stan dard maximum 
hkehhood (ML) method, by maximizing the joint log-likehhood (jJod . 119971 ). 



£(/3,7,R) 



i=l 



log/iY(y; 



ii,. 



I yid'i Xjl) 



,Xid) 



(4) 



over the univariate and copula parameters (/3,7, R), wh ere h 



is th e joint pmf of the 



multivariate discrete response ve c tor Y " = ( Yi , . . . , Yh) . ISone (120071) . influencing oth er 
authors (e.g. iHeinen and Rendfol (|200il2008i l: ]Madsenl (|2009l ): iMadsen and Fan3 |20n|)), 
acknowledged that the pmf can be obtained as a finite difference of the cdf in ([T]). Generally 
speaking, this is an imprecise statement, since calculating the finite difference among 2 
numerically computed orthant probabilities may result in negative values. The pmf can be 
alternatively obtained by computing the following rectangle probability, 



/iY(y;x) = Pr(Yi = yi,...,yd = yd;x) 

= Pr(yi - KYi < yi, . . . ,yd - KYd < yd;x) 

*-i[i^yi(yi;xi)] i-'l>-^[FY^(ya;ycd)] 

■■■ <pR{zi,...,Zd)dzi . 



(5) 



• • dzd, 



where (pn denotes the standard MVN density with latent correlation matrix R. 

The computation of MVN rectangle probabilities such as the one in ([5]) is possible, 
but requires multidimensio nal integration. However, there is a s pecial case overlooked in 
the biostat i stics l iterature (jKiefeii . Il982l : iMadsen and Fana . I2OIII : lOchi and Prenticd . Il984l : 
Song et al.l . |2009| ): for positive exchangeable correlation structures, the d- d imens ional in 



I972I . p. 48). 



tegrals conveniently reduce to 1-dimensional integrals ([Johnson and Kota . 
Hence, MVN rectangle probabilities can be quickly computed to a desired accuracy that 
is 10~^ or less, because 1-dimensional numerical integrals are computationally easier than 
higher-dimensional numerical integrals. For general correlation structures, there are several 
papers in the literature that focus on the computation of the MVN rectangle probabilities, 
and, convenien tly, the irnplemen tation of the proposed algorithms is available in contributed 
R packages [j. ISchervishI (jl984l ) proposed a locally adaptive numerical integration method 
but this m ethod, while more accurate, is time consuming and restr i cted t o a low dimension. 
Therefore, iGena (jl992l ) proposed a Monte Carlo method and Ijod ()1995l ) proposed two ap- 
proximations to multivariate normal probabilities. The first-order approximation makes use 
of all of the univariate and bivariate marginal probabilities, and the second-order approxi- 
mation also makes use of trivariate and four-variate marginal probabilities. These advances 
in computation of MVN probabilities can be used to implement MVN copula models with 
discrete response data: 

• For positive exchangeable dependence structures, if one computes the rectangle MVN 
probabilities in ([!]) with the 1-dimensional integral method in iJohnson and Kotz 
(|l972l ). then one is using a numerically accurate likelihood method that is valid for 
any dimension. 



If one computes the rectangle MVN probabilities in (j3|) with the methods i n iJoe 
( I995I ). then one is using an approximate likelihood method; see e.g. Ijod ( 19971 ). 



^Both approximations to MVN r e ctang le in I Joel (|l995l ). the 1-dimensional integral in the exchangeable 



case, and the method in ISchervishI ((lOSJ), can b e comput e d wit h the functions m vnapp, exchmvn, and 
pmnorm, respectively, in the R package mprobit (|joe et all l201lh. The methods in iGena (11993) can be 



computed with the function pmvnorm in the R package mvtnorm (jGenz et all 12012 ) 



If one com putes the re ctangle MVN probabilities in ^ via simulation based on the 
method in iGenj (jl992l ). then o ne is usi n g the simu lated likelihood m e thod but with 
something much better than in iMadsenI ( 2003 ) and lMadsen and Fangi ( 201ll ). 



Since both approximations in I Jod (|l995l ) are better when the correlations are smaller, we 
concentrate on the siniulate d likelihood method based on the computation of the MVN 
probabilities ala lGena ( 19921 ) . 



3.1 Simul ated likelihood method via the optimized method of lGenz and Bretz 
(l2002h 



For an integral in the context of an MVN rectangle probability, 



P{aj < Zj < bj,j = 1, ...d) 



ai 



bd 



bn{zi,...,Zd)dzi ...dzd, 



(6) 



o-d 



for an MVN density (pn with correlation matrix R, iGena (jl992l ) uses a sequence of thee 
transformations to transform the original integral into an integral over a unit hypercube. 



P{aj<Z,<bjJ = l,...d)= [ ■■■ [ 

Jo Jo 



e{vi,... ,Vd-i)dvi ...dvd-i, 



with. 



e{vi,...,Vd-i) = eie2ivi)e3{vi,V2)...edivi,...,Vd-i), 
ej{vi,...,Vj-i) = ej{vi,...,Vj-i) -ej{vi,...,Vj-i), 

i-i 



ej{vi,. 


■ > Vj-l) = 


V 


aj 


->; 

fc=l 


Ejivi,. 


■,Vj-l) = 


V 


)r 


J-1 



^Cjfc$ Hefc(^'i)--->^fc-i) + «fcefc(t'i,---,Vfc-i)} /& 



^33 1 



^Cjk^ ^{ek{vi,...,Vk~i) +Ukek{vi,...,Vk-i)} /cjA; 



k=l 

C = {cjk : 1 < j < k < d) is the matrix used for the Cholesky decomposition of R. 

This sequence of transformations reduces the number of integration variables by one, 
but, more interestingly, the rectangle integral is converted to a bounded integrand, so that 
the rectangle probability can be successfully evaluated via importance sampling based on a 
(d — l)-variate standard uniform random sample V^, 



P{aj < Zj < bj,j = 1, ...d) = m ^ ^ e(vfc) 

fc=i 



Genz and Breta ()2002l ) improve the performance of the crude Monte Carlo methods in 



Gena (jl992l ) by calling a randomized quasi Monte Carlo method with the use of antithetic 
variates. They use approximations of the form, 

m ^ P 

P(a, <Z, <6,-,j = l,...(i)=7n-^j; — ^(e(|2Lpp + VfcJ-l|) + e(l-|2Lpp + VfcJ-l|). 

fc=i p=i 

(7) 
In this form, [tj denotes the vector obtained by taking the fractional part of each of the 
components of t , and Pp,p = 1, . . . , P i s a set of quasi-random points. 

To sum up, iGenz and Breta ()2002l ) achieve error reduction of Monte Carlo methods 
with variance reduction methods as (a) transforming to a bounded integrand, (b) using 



6 



a ntithetic variates^ and ( c ) usin g a randomized quasi Monte Carlo method. The test results 
in lGenz and Breta ( 20021 . |2009| ) show that their method is very e fficient, compared to ot her 
methods in the literature. Note in pa ssing that the me thod in iGenz and Breta (J2002l ) is 



"optimized" in the mtvnorm R package (|Genz et al.l . |2012| ). Hence, on the calculation of the 



approximation in ([7]), one doesn't need to worry about the selection, for example, of the 
number of jitters m, or the number of quasi points P. 

We implement a simulated likelihood (he reafter SL), where the rectangle MVN proba- 
bilities are computed based on the method in lGenz and Breta (l2002l ). Since the estimation 
of th e parameter s of the MVN copula-based models is obtained using a quasi-Newton rou- 
tine ( Nashl . I1990I ) applied to the log- likelihood in (j4]), the use of randomized quasi Monte 
Carlo simulation to four decimal place accuracy for evaluations of integrals works poorly, 
because numerical derivatives of the log-likelihood with respect to the parameters are not 
smooth. In order to achieve smoothness, the same set of uniform random variables should 
be used for every rectangle probability that comes up in the optimization of the SL. Hence, 
our implementation allows estimation of parameters from response vectors of dime nsion 
much l a.rger than t h ree th at used in previous theoretical studies and applications (e.g. ISong 
(|2007l ): ISong et all toO^h ). 



4 Theoretical efficiency 



In this section, we perform several theoretical calculations, similarly to iJod (119951. l2008l) . 
to iii v estiga t e the accuracy of the "approximat e " like lihood methods (JHeinen and Rengifd . 
2OO7I . I2OO8I : iMadsenl . I2OO9I : iMadsen and Fand . l201lh . and the SL method in Subsection 
3.1, which is based on evaluating: the multidimensional integrals at the likelihood with the 
method in lGenz and Breta (|2002l ). 



4.1 Asymptotics 

In this subsection, we study the asymptotics of the HR and SL methods, and we assess the 
accuracy based on the limit (as the number of clusters increases to infinity) of the maximum 
surrogate likelihood estimate (HRMLE) and the maximum SL estimate (MSLE) . By varying 
factors such as dimension d, regression and not regression parameters, the amount of dis- 
creteness (binary versus count response), and latent correlation for exchangeable structures, 
we demonstrate patterns in the asymptotic bias of the HRMLE and MSLE, and assess the 
performance of HR and SL. For the cases where we compute the probability limit, we will 
take a constant dimension d that increases. We will also conveniently use discrete covariates 
so that we can assume that there are a finite number of distinct values. The idea of the 
continuous extension is to replace a numerically more difficult MVN rectangle probability 
calculation with a simpler MVN density value, and hence it is discrete responses that matter 
and not the type of covariates. The pattern should be similar with continuous covariates 
but the bias cannot be determined as easily. 

Let the T distinct cases for the discrete response and the covariates be denoted as 



(y«,x«),...,(y(^),x(^)). 



where y 



(t) 



iy\ 



it) 



'Vd 



(i)^ ^(t) 



(xl' 



{<) 



x|, ), t = 1, . . . , r. In a random sample of size 



n, let the corresponding frequencies be denoted as n^^' , . . . , n^^' . Assuming a probability 
distribution on the covariates, for t = 1, . . . ,T, let p^^' be the limit in probability of vy"' jn 



as n — )• oo. For the simulated likelihood in 

T 



n 



, we have the limit, 
i^(/3,7,p)^5^p(*)log/iY(yP,...,y?;xP,...,xJ)), 



(8) 



t=i 



where /iY(y^* , x^*') is computed using the method in lGenz and Breta ( 2002I ). The limit of 
the MSLE (as n — ?■ oo) is the maximum of ([8]); we denote this limit as (/3 , 7 , p ). Note 
in passing that the limit of the standard MLE (as n — )• 00) is the max imum of (El) where 
Iv^JjA^) . -jj-(i) j ^g computed with the 1-dimensional integral method in Ijohnson and Kotz 
(I1972I I. 

The surrogate log-likelihood based on the MVN density with exchangeable dependence 
structure is, 

n d 

^HR{f3, 7, p) = ^ log \^c{uii ,...,Uid;p)'[{ hj iVij ; ^ij) 



where c{uii, .. . ,Uid]p) 



i=l 

1 



j=i 



20091 ^ with Qij = $' 



^ll+(d--l)p]il-p)d- 



^e 2(i-p)li+Vi)pl (('^~^^^^i=i4"^^i<fcg'j'?'fc) I 



Zezulal . 



[u, 



ij) 



<I> ^[FYj{yij — l;'x.ij)+VijfYj{yij',^ij)] as realizations of standard 



P 



normal random variables. Therefore?! ^^///j(/9,7,p) is, 

n d n n d 

(d - i)p X] IZ 4 ~'^^Y1 ^^J^*^ + ^~^ JZ Z] ^°s Iy, {yij]^ij)- 

i=l j=l i=l j<k i=l j=l 

Then as ?! — )• 00, the limit in probability of n~^£HR{(3,J,p) is. 



t=i 



.^log[l + {d-l)p]-^log{l-p) 



2{l-p)[l + {d-l)p] 



(9) 



where C ;C, > J = l,---,d, t = 1,...,T are conditional expectations for the truncated 
normal distribution that have closed forms. Further details are given in the Appendix. 
The limit of the HRMLE (as n — )■ 00) is the maximum of Q; we denote this limit as 

We will compute these limiting HRMLE and MSLE in a variety of situations to show 
clearly if the HR and SL methods are good. By using these limits, we do not need Monte 
Carlo simulations for comparisons, and we can quickly vary parameter values and see the 
effects. The p^*) in ([8]) and Q are the mode l based probabilities /ixlY '-, ^ ), and computed 
with the 1-dimensional integral method in I Johnson and Kota ( 19721 ). For marginal models 
we use Bernoulli (/i), Poisson(p), and negative binomial (N B). For the latter model, w e 
use both the NBl(/i, 7) and NB2(/i, 7) parametr i zation in ICameron and Trivedil ( 19981 ): 
the NB2 parametrization is that used in iLawlesa ( 19871 ). For a count response, we get a 
finite number of y'*^ vectors by truncation. The truncation point is chosen to exceed 0.999 
for total probabilities. Further, we use only one binary covariate, which is the same for 
each cluster; this scenario is typical for longitudinal data with time-independent covariates. 
Other discrete (and time-dependent) covariates can be used, but computations are more 
time consuming because T is larger. 
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0.6 
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0.8 
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-0.467 
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0.467 



Table 1: Limiting HRMLE for MVN copula-based models with marginal logistic regression. 



Representative results are shown in Tables [T] and [2] for logistic and NB2 regression, with 
MSLE results omitted because they were identical with MLE up to three or four decimal 
places. Therefore, the SL method leads to unbiased estimating equations. Regarding the 
HR method, by varying the latent correlation p and dimension d, results are similar for 
the binary and count responses. There is substantial asymptotic downward bias for the 
HRMLE of the latent correlation (p^^), and it seems that there is negligible asymptotic 
bias for the HRMLE for the parameters that are regression coefficients (/3q , /3[^ ); note 
that this slightly increases as either d or p increases. Calculating the limit for HRMLE of 7 
(7^^) for NB2 regression, it can also be seen that there is substantial asymptotic downward 
bias for the univariate marginal parameters that are not regression coefficients as the latent 
correlation p increases. The results in Tables [1] and [2] show that the HR method is adequate 
with regard to the univariate marginal parameters that are regression coefficients. 



d 


P 


pHR 


/3o 


^r 


/3i 


pHH 


7 


^HH 


2 


0.3 


0.191 


-0.5 


-0.498 


0.5 


0.495 


0.5 


0.480 


2 


0.6 


0.397 


-0.5 


-0.492 


0.5 


0.483 


0.5 


0.410 


2 


0.8 


0.550 


-0.5 


-0.481 


0.5 


0.466 


0.5 


0.302 


3 


0.3 


0.191 


-0.5 


-0.497 


0.5 


0.492 


0.5 


0.468 


3 


0.6 


0.394 


-0.5 


-0.484 


0.5 


0.472 


0.5 


0.361 


3 


0.8 


0.545 


-0.5 


-0.466 


0.5 


0.446 


0.5 


0.214 



Table 2: Limiting HRMLE for MVN copula-based models with marginal NB2 regression. 
The truncation point is 10. 



After evaluating the adequacy of the MF and SL log-likelihood on finding the peak 
(MLE) , we evaluate if the curvature (Hessian) is also correct for the cases where the HRMLE 
and MSLE are correct. To check this, we also computed the negative inverse Hessian H 
of the limit of the surrogate log-likelihood in ([9]) and the simulated log- likelihood in (l8|); 
because these are limits as n —t- 00 of n~^ times the log-likelihood, H is the inverse Fisher 
information, or equivalently, the covariance matrix for sample size n is approximately n~^H. 
For a comparison, we have also calculated the Hessian at the limit for the standard MLE. 
For simpler comparisons, we convert to standard errors (SE), say for a sample size of 
n = 100 (that is, square roots of the diagonals of the above matrices divided by n). Some 
representative results are given in Tableland Tabled] for an MVN copula-based model with 
marginal logistic and NB2 regression, respectively, with the MSLE results omitted because 
they were again identical with MLE up to three or four decimal places. 

The results in Tables [3] and H] show that the HR method slightly underestimates the SE 
for the regression parameters. Underestimation of the curvature increases as the dimension 
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d 


p 


/3o 


Pi 


ML 


HR 


ML 


HR 


ML 


HR 










SE(/3o) 


SE(/3i) 


SE(p) 


2 


0.3 


-0.5 


0.5 


0.16 


0.15 


0.22 


0.21 


0.11 


0.07 


2 


0.6 


-0.5 


0.5 


0.17 


0.16 


0.24 


0.22 


0.08 


0.06 


2 


0.8 


-0.5 


0.5 


0.18 


0.16 


0.26 


0.22 


0.05 


0.06 


5 


0.3 


-0.5 


0.5 


0.12 


0.10 


0.17 


0.14 


0.05 


0.03 


5 


0.6 


-0.5 


0.5 


0.15 


0.11 


0.21 


0.16 


0.05 


0.03 


5 


0.8 


-0.5 


0.5 


0.17 


0.12 


0.23 


0.17 


0.03 


0.03 


10 


0.3 


-0.5 


0.5 


0.11 


0.08 


0.15 


0.11 


0.03 


0.02 


10 


0.6 


-0.5 


0.5 


0.14 


0.09 


0.19 


0.12 


0.04 


0.02 


10 


0.8 


-0.5 


0.5 


0.16 


0.09 


0.22 


0.13 


0.03 


0.02 



Table 3: Standard errors (SE) of the limiting HRMLE and MLE for MVN copula-based 
models with marginal logistic regression. 



d 


P 


A 


/3i 


7 


ML 


HR 


ML 


HR 


ML 


HR 


ML HR 












SE(/3o) 


SE(/3i) 


SE(7) 


SE(p) 


2 


0.3 


-0.5 


0.5 


0.5 


0.11 


0.11 


0.15 


0.14 


0.15 


0.14 


0.08 0.07 


2 


0.6 


-0.5 


0.5 


0.5 


0.13 


0.11 


0.16 


0.15 


0.15 


0.13 


0.06 0.06 


2 


0.8 


-0.5 


0.5 


0.5 


0.13 


0.12 


0.17 


0.15 


0.17 


0.12 


0.04 0.04 


3 


0.3 


-0.5 


0.5 


0.5 


0.10 


0.09 


0.13 


0.12 


0.12 


0.11 


0.06 0.04 


3 


0.6 


-0.5 


0.5 


0.5 


0.12 


0.10 


0.15 


0.13 


0.13 


0.10 


0.05 0.04 


3 


0.8 


-0.5 


0.5 


0.5 


0.13 


0.10 


0.16 


0.13 


0.15 


0.09 


0.03 0.03 



Table 4: Standard errors (SE) of the limiting HRMLE and MLE for MVN copula-based 
models with marginal NB2 regression. The truncation point is 10. 



d and/or the latent correlation p increases. The HR method leads to underestimation of 
the SE, because it is using information on jitters that are not in the observed data. 

To sum up, the asymptotics show that the maximum SL method is as good as maximum 
likelihood, while the maximum surrogate log-likelihood using the CE leads to approximate 
asymptotic unbiasedness for (some) univariate marginal parameters and not for the latent 
correlation parameters, because the jittering is univariate and does not account for depen- 
dence. Although we show the details only for exchangeable dependence, we expect the 
above results to hold in general, as well as to apply to different dependence structures. 
Section 5 contains small sample efficiency calculations using both exchangeable and AR(1) 
dependence. 

Closing this section, we explain why the the surrogate log-likelihood using the CE fails. 
It is easiest notationally to indicate what is happening with bivariate discretized normal: 

Yl = jl,Y2 = J2 iff Zl,ji~l < Zi< Zijj, Z2J2~1 < Z2< Z2J2 

where zij,Z2j are cutpoints, and {Zi,Z2) is bivariate standard normal with correlation p. 
Then conditioned on {Yi = ji,Y2 = J2}, (^1,^2) is dependent with a truncated bivariate 
normal distribution. This means the jittered variables (Vl,T^) have to be dependent to 
get the correct conditional distribution. So, for jittering to be asymptotically unbiased, a 
sequential approach would be needed based on the previous estimates of the latent correla- 
tions. 
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4.2 Computation of MVN rectangle probabilities 

In this section, we de scribe how the high- dimensional multivariate normal rectangle proba- 
bility is computed by iMadsenI (J2009l ) and iMadsen and Fang! (J201ll ) using i mportance sam- 
pling, and compare it with the naive simulation method, and the method in lGenz and Bretz 

Jiooi). 

For an integral in the context of an MVN rectangle probability in ([6]) , naive simulation 



uses. 



•m 



^X]-'-("j < ^i < ^J'-^ = l,-",rf), 



where l(^) denotes the indicator function of the set A and zi,...,Zd are m iid variates from 
(/)R. Importance sampling gives. 






[(t>nizi, ..., zd)/g{zi, . . . , zd)\g{zi, . . . , Zd)dzi . . . dzd, 

'0,1 J ad 

where g is a closed-form density from which it is easy to simulate. 
Estimation is via, 

m'^^^cpnizi, . . . , Zd)/g{zi, . . . ,Zd), 

where zi,...,Zd are iid simulated from g. Better choices of g have small variance for w = 
4>R./g, such that w is bounded (in which case one can bound the variance). The penalty 
for a bad g can be longer run times than for a general Monte Carlo simulation without 
im portance sampl ing. 

MadsenI ( 20091 ) and lMadsen and Fand ( 20 111 ) implement the integration in ([6]) by trans- 



forming zi = $ ^c^i), ...,Zd = ^ H^d) to get, 

rHbi) ^^^ rHbd) c/>R(cD"i(a;i),...,$-i(a;rf)) 



(<^-i(a;i))---0($-iM) 



duji . . . diOd, 



where (j) is the standard normal density. The iDenuit and LambertI ()2005l ) uniform extension 
in this case corresponds to evaluating the above integral based on a d-variate uniform 
random sample flk = {^i,k, ■ ■ ■ , ^d,k), where i^j^k, j = 1, ■ ■ ■ ,d are uniform in the interval 
$(aj) to ^{bj) for j = 1, . . . ,d. Approximation is via, 



m 



m 
k=l 



</.($-iK,fc))---<A(^"n^rf,fc)) 



where w^ = {oJi^k-, ■ ■ ■ ,^d,k) are realizations of the jitters fl^. 

In Table (U comparisons of the accuracy of the MF (m = 10^ and m = W^), the 
naive {m = 10^) method, and the method in iGenz and Breta ( 20021 ) are presented. The 
accuracy comparisons are for the computation of the equicorrelated rectangle probability 
of the form Pr(— a < Zj < a,j = l,...,d) for dimensions d = 5,10,20 and correlations 
p = 0.3, 0.6, 0.8. These probabili ties are also computed with the numerically accurate 

p to three or four decimal 
For these MVN 



20021 ^ 



method in I Johnson and Kot J (119721 ). and the res ults are identical u 
places with the ones found by the method in iGenz and Bretzj ( 
rectangle probabilities, we simulate based on the MF method, keeping track of the values 
of w{-) when simulating from the d-variate uniform random sample. Based on the sample 
variance of w{-), we estimate the achieved accuracy at the number of replications m, that 
is, SD= y^Var[w{-)]/m < accuracy. This calculation is simple for the naive method. 
In Table m it is clear that the MF method (even with m = 10^) gets worse as 

1. the dimension d increases; 
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d a 


P 


GB 


MF 
m - 


SD 

= 10^ 


MF 

m = 104 


SD 


Naive 


SD 


5 1 


0.3 


0.176 


0.176 


0.001 


0.176 


< 10^^ 


0.176 


0.004 




0.6 


0.266 


0.266 


0.005 


0.267 


0.001 


0.267 


0.004 




0.8 


0.391 


0.382 


0.014 


0.395 


0.005 


0.393 


0.005 


2 


0.3 


0.808 


0.823 


0.023 


0.809 


0.006 


0.809 


0.004 




0.6 


0.847 


0.863 


0.064 


0.850 


0.016 


0.847 


0.004 




0.8 


0.883 


0.808 


0.109 


0.862 


0.033 


0.883 


0.003 


4 


0.3 


1.000 


1.049 


0.055 


0.996 


0.012 


1.000 


< 10^3 




0.6 


1.000 


1.025 


0.133 


0.981 


0.032 


1.000 


< 10^3 




0.8 


1.000 


0.805 


0.112 


0.915 


0.044 


1.000 


<10-3 


10 1 


0.3 


0.038 


0.037 


< 10^^ 


0.038 


< 10^^ 


0.039 


0.002 




0.6 


0.110 


0.107 


0.003 


0.111 


0.001 


0.113 


0.003 




0.8 


0.267 


0.244 


0.020 


0.270 


0.007 


0.270 


0.004 


2 


0.3 


0.674 


0.641 


0.040 


0.677 


0.010 


0.675 


0.005 




0.6 


0.768 


0.741 


0.192 


0.753 


0.033 


0.763 


0.004 




0.8 


0.840 


0.647 


0.293 


0.659 


0.063 


0.837 


0.004 


4 


0.3 


0.999 


0.912 


0.111 


0.972 


0.024 


1.000 


< 10-3 




0.6 


0.999 


1.001 


0.420 


0.890 


0.057 


1.000 


< 10-3 




0.8 


1.000 


0.621 


0.289 


0.624 


0.069 


1.000 


< 10-3 


20 1 


0.3 


0.002 


0.002 


< 10^^ 


0.002 


< 10^^ 


0.002 


< 10-3 




0.6 


0.024 


0.023 


0.001 


0.025 


< 10^3 


0.025 


0.002 




0.8 


0.156 


0.121 


0.024 


0.168 


0.010 


0.159 


0.004 


2 


0.3 


0.493 


0.422 


0.016 


0.498 


0.019 


0.496 


0.005 




0.6 


0.670 


0.354 


0.064 


0.633 


0.047 


0.671 


0.005 




0.8 


0.792 


0.478 


0.451 


0.673 


0.283 


0.792 


0.004 


4 


0.3 


0.999 


0.672 


0.036 


0.959 


0.091 


0.998 


< 10-3 




0.6 


0.999 


0.347 


0.107 


0.748 


0.073 


0.999 


< 10-3 




0.8 


0.999 


0.514 


0.503 


0.699 


0.341 


0.999 


< 10-3 



Table 5: Comparisons of the accuracy of the MF, the naive method, and the method 
in Genz and Bretz (GB, 2002) for the equicorrelated rectangle probabihty of the form 
Pr(— a < Zj < a,j = 1, . . . ,d) for dimensions d = 5, 10, 20 and correlations p = 0.3, 0.6, 0.8. 
The computed probabilities with the GB method are identical u p to three or four decima l 
places with the ones found by the numerically accurate method in I Johnson and Kota ( 19721 ). 



2. the latent correlation p increases; 

3. the limits (integrated region) increase; 

and even the naive method is much better; it has 2 c lecimal place accu r acy fo r m 
replications. The use of jitters with m = 103, as in 



10^ 



Madsen and Fand ( 201ll ). is highly 



inefficient even in a low dimension. Therefore, the MF method is a very inefficient way to 
compute a multivariate normal rectangle probability, which means that even m = 10^ is far 
from sufficient to achieve a desired accuracy. It is also shown to be difficult to even estimate 
the SD, because the integrand has no bound. 

Regarding equation ([3]), this is much worse than approximating many d-dimensional 
MVN rectangle probabilities separately, that is, separate jitters of each d-dimensional prob- 
ability. This is actually a worse way to do simulated likelihood compared with what we 
mention above. 
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5 Simulations 

In this section we performed several simulation studies to assess the performance of the 
HR, MF, and SL methods. We used structured latent correlation matrices for the MVN 
copula. For exchangeable dependence, we took R as {1 — p)I(i + p^d, where I^ is the identity 
matrix of order d and J^ is the d x d matrix of Is. For AR(1) dependence, R was taken as 

For marginal models we used Bernoulli(/i), Poisson(/i), NB1(;U, 7), and NB2(/l(, 7) par- 
ametrization of the negative binomial distribution. For the covariates and regression pa- 
rameters, we chose p = '2,:x.j = {l,Xj)~^ , where Xj drawn from a U[—l, 1], and let p, depend 
on the covariates, that is "qipj) = (/3o + PiXj), j = 1, . . . ,d, where /3i = — /3o = 0.5. For the 
link function r/, we took the log link function for Poisson and NB regression, and the logit 
link function or the probit link function for binary regression. Note also that binary and 
Poisson regression 7 is null, while for NBl and NB2 regression 7 is scalar (r = 1). 

5.1 Assessing the variability due only to jittering 

As a first step, we assess the variability of the HR and MF estimates over different sets of 
jitters Vfc, k = 1, . . . ,m. Our goal is to define the value of m for which different sets of 
jitters for the same data set reproduce the same results. To this end, we fixed one simulated 
dataset and used 5 sets of random jitters V^, k = 1, . . . ,m,; the number of jitters m was 
set at 7TT, = 10, 10^, 10^, 10^. The following experiments are typical of the results that were 
obtained. 

We fixed one simulated data set with n = 100 observations from the bivariate normal 
copula with exchangeable moderate dependence {p = 0.5) and marginal logistic regression. 
Table [6] shows the variability of the HR and MF estimators over 5 different sets of jitters as 
compared with ML estimates and standard errors; for d = 2 there is a numerical accurate 
calculation of the bivariate normal rectangle probabilities in ([3]). To investigate if these 
results hold for higher dimensions, we also fixed one simulated data set with n = 100 
observations from the MVN (d = 5) copula with strong AR(1) dependence {p = 0.8), and 
NB2 regression with large over dispersion 7 = 2. Table [7] shows the variability of the HR 
and MF estimators over 5 different sets of jitters as compared with maximum SL estimates 
and standard errors. These results and other computations that we have done with latent 
correlation p varying from to 0.9 in 0.1 increments confirm that for the HR (MF) method, 
771 = 100 {m = 1000) is sufficient to produce the same estimates over different sets of jitters. 
Note that the SEs of the ML and maximum SL estimates are obtained via the square roots of 
the diagonals of the inverse Hessian computed numerically during the maximization process. 

5.2 Small-sample efficiency 

In this subsection, we gauge the small-sample efficiency of the HR, MF, and SL meth- 
ods. The number of jitters was set at m = 10,10^,10^,10^, and the sample size at 
n = 100, 300, 500. The following experiments are typical of the results that were obtained. 
We report simulations from small sample sizes (n = 100), since the additional simulation 
results do not show sensitivity to n. Note that the theoretical variances of the maximum 
SL and MF estimates are obtained via the gradients and the Hessian computed numerically 
during the maximization process, while the HR variances are obtained by averaging the 
latter over m runs. 

For n = 100, 10^ random samples of size n were generated from the bivariate normal 
copula with exchangeable moderate dependence {p = 0.5), and marginal logistic regression. 
Table El contains the parameter values, the bias, variance (Var), and mean square errors 
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MF 


(m = 


10) 


HR 


{m = 


10) 


set 


/3o 


h 


P 


/3o 


/3i 


P 


1 


-0.51 


0.40 


0.20 


-0.51 


0.40 


0.15 


2 


-0.54 


0.44 


0.33 


-0.51 


0.42 


0.18 


3 


-0.52 


0.41 


0.27 


-0.51 


0.41 


0.16 


4 


-0.50 


0.45 


0.27 


-0.51 


0.42 


0.16 


5 


-0.51 


0.41 


0.25 


-0.51 


0.41 


0.21 




MF 


{m = 


10^) 


HR 


(m = 


10^) 


set 


/3o 


/5i 


P 


/3o 


/3i 


P 


1 


-0.52 


0.33 


0.39 


-0.51 


0.41 


0.16 


2 


-0.52 


0.45 


0.32 


-0.51 


0.42 


0.18 


3 


-0.52 


0.47 


0.39 


-0.51 


0.42 


0.18 


4 


-0.52 


0.41 


0.33 


-0.51 


0.42 


0.17 


5 


-0.49 


0.46 


0.30 


-0.51 


0.42 


0.17 




MF 


{m = 


10^) 


HR 


(m = 


10^) 


set 


/3o 


/5i 


P 


/3o 


/3i 


P 


1 


-0.52 


0.44 


0.35 


-0.51 


0.42 


0.17 


2 


-0.51 


0.45 


0.38 


-0.51 


0.42 


0.17 


3 


-0.51 


0.45 


0.36 


-0.51 


0.42 


0.17 


4 


-0.49 


0.49 


0.42 


-0.51 


0.42 


0.17 


5 


-0.50 


0.47 


0.32 


-0.51 


0.42 


0.17 




MF 


(m = 


10^) 


HR 


(m = 


10^) 


set 


/3o 


/3i 


P 


/3o 


/3i 


P 


1 


-0.51 


0.45 


0.36 


-0.51 


0.42 


0.17 


2 


-0.52 


0.46 


0.38 


-0.51 


0.42 


0.17 


3 


-0.51 


0.45 


0.38 


-0.51 


0.42 


0.17 


4 


-0.50 


0.46 


0.37 


-0.51 


0.42 


0.17 


5 


-0.51 


0.45 


0.34 


-0.51 


0.42 


0.17 


ML 


/3o 


SE 


di 


SE 


P 


SE 




-0.51 


0.03 


0.47 


0.07 


0.43 


0.02 



Table 6: Variability of the MF and HR estimates with different sets of jitters V^, k = 
l,...,7n, along with ML estimates and standard errors (SE) for a fixed simulated data 
set with n = 100 observations from the bivariate normal copula model with moder- 
ate dependence [p = 0.5) and logistic regression. The number of jitters was set at 
m = 10, 10^10^10^. 



(MSE) of the MF, HR and ML (for d = 2 there is a numerical accurate calculation of the 
bivariate probabilities) estimates, along with the average of their theoretical variances (V). 
For n = 100, 10 random samples of size n were generated from the MVN (d = 5) copula 
with strong AR(1) dependence (p = 0.8) and NB2 regression with large over dispersion 
7 = 2. Table [9] contains the parameter values, the bias, variance (Var) and mean square 
errors (MSE) of the MF, HR and maximum SL estimates, along with the average of their 
theoretical variances (V). Because HR and MF likelihood takes much longer with larger 
m and d, we ran fewer replications on a subset for the 5-dimensional case. With an Intel 
Core Duo 2x 2.53Ghz processor, the computing times in an R program for SL, MF and HR 
with 771 = 10^ jitters, averaged about 1, 6, and 6 minutes respectively; the time is about 1 
hour for HR and MF with m = 10^ jitters. To this end, simulations have been restricted to 
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MF (m 


= 10) 






HR (m 


= 10) 




set 


/3o 


/3i 


7 


P 


/3o 


Pi 


7 


P 


1 


-0.44 


0.49 


1.14 


0.50 


-0.42 


0.48 


1.14 


0.50 


2 


-0.39 


0.49 


1.13 


0.53 


-0.42 


0.46 


1.14 


0.50 


3 


-0.45 


0.53 


1.09 


0.50 


-0.43 


0.50 


1.11 


0.48 


4 


-0.43 


0.42 


1.16 


0.53 


-0.42 


0.48 


1.15 


0.50 


5 


-0.39 


0.56 


1.08 


0.52 


-0.42 


0.50 


1.14 


0.49 




MF (m 


= 10^) 




HR (m 


= 10^) 




set 


/5o 


/5i 


7 


P 


/3o 


Pi 


7 


P 


1 


-0.37 


0.54 


1.13 


0.58 


-0.42 


0.49 


1.13 


0.49 


2 


-0.40 


0.47 


1.20 


0.57 


-0.42 


0.49 


1.12 


0.50 


3 


-0.41 


0.47 


1.17 


0.56 


-0.42 


0.48 


1.12 


0.49 


4 


-0.42 


0.50 


1.11 


0.53 


-0.42 


0.49 


1.13 


0.49 


5 


-0.43 


0.51 


1.17 


0.55 


-0.42 


0.48 


1.13 


0.49 




MF (m 


= 10^) 




HR (m 


= 10^) 




set 


/3o 


/5i 


7 


P 


Po 


Pi 


7 


P 


1 


-0.38 


0.48 


1.13 


0.58 


-0.42 


0.48 


1.13 


0.49 


2 


-0.38 


0.46 


1.24 


0.61 


-0.42 


0.48 


1.13 


0.49 


3 


-0.37 


0.48 


1.16 


0.58 


-0.42 


0.48 


1.13 


0.49 


4 


-0.39 


0.48 


1.20 


0.57 


-0.42 


0.49 


1.12 


0.49 


5 


-0.41 


0.49 


1.16 


0.56 


-0.42 


0.48 


1.13 


0.49 


SL 


/3o 


SE 


di 


SE 


7 


SE 


P 


SE 




-0.37 


0.15 


0.49 


0.09 


1.96 


0.40 


0.80 


0.03 



Table 7: Variability of the MF and HR estimates with different sets of jitters V^, k = 
1, . . . , 772, along with maximum SL estimates and standard errors (SE) for a fixed simulated 
data set with n = 100 observations from the MVN {d = 5) copula model with strong 
dependence {p = 0.8) and NB2 regression. The number of jitters was set at m = 10, 10^, 10^. 



m = 10'^. 

Conclusions from the values in the tables and other computations that we have done 
are the following: 

1. The SL method is highly efficient according to the simulated biases and variances. 

2. The HR and MF methods yield estimates that are almost as good as the ML and 
maximum SL estimates for the regression parameters. 

3. The HR and MF methods underestimate the univariate marginal parameters that are 
not regression coefficients as the latent correlation increases. 

4. The efficiency of HR and MF methods is low for the latent correlation, for a wide range 
of p values. The dependence parameters (latent correlations) are underestimated. 

5. For the MF method, it appears that there are improvements with m = 10^, so one 
might wonder if m = 10^ would be better. This number of jitters is prohibitive with 
respect to the computational time as the dimension increases. Further, theoretically, 
there are still problems; see subsection 14.21 

6. Overall, efficiencies of HR do not change considerably, when m changes. 

7. The variances for the MF and HR estimates are underestimated for the non-regression 
parameters, the intercept, and the regression parameters of discrete covariates. 
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Method 



m 



nBias nVar nMSE nV 



MF 



HR 



ML 



/3o 



-0.5 



10 
10^ 
10^ 
10^ 
10 
102 
10^ 
10^ 



-0.56 
-0.54 
-0.58 
-0.63 
-0.60 
-0.60 
-0.60 
-0.60 
-0.73 



3.02 
3.02 
3.03 
3.02 
3.00 
2.99 
2.99 
2.99 
3.00 



MF 



HR 



ML 



(3i 



0.5 



10 
102 
10^ 
10^ 
10 

102 

10^ 
10^ 



0.70 
0.76 
0.83 
0.82 
0.70 
0.70 
0.70 
0.70 
0.85 



6.54 
6.49 
6.45 
6.41 
6.57 
6.57 
6.56 
6.56 
6.31 



MF 



HR 



ML 



p = 0.5 



10 
102 
10^ 
10^ 
10 

102 

10^ 
10^ 



-20.93 
-15.17 
-11.19 
-8.30 
-30.42 
-30.40 
-30.39 
-30.39 
-0.73 



0.77 
0.74 
0.78 
0.87 
0.42 
0.37 
0.36 
0.36 
1.84 



3.02 
3.02 
3.03 
3.03 
3.00 
3.00 
2.99 
2.99 
3.00 



6.54 
6.50 
6.45 
6.42 
6.58 
6.57 
6.57 
6.57 
6.31 



5.15 
3.05 
2.03 
1.56 
9.67 
9.61 
9.59 
9.59 
1.84 



2.61 
2.69 

2.75 
2.80 
2.48 
2.48 
2.48 
2.47 
2.92 



6.56 
6.48 
6.41 
6.35 
6.67 
6.67 
6.67 
6.67 
6.15 



0.96 
1.00 
1.06 
1.12 
0.90 
0.90 
0.90 
0.90 
1.82 



Table 8: Small sample of sizes n = 100 simulations (10^ replications) and resultant biases 
and mean square errors (MSE) and variances (Var), along with average theoretical variances 
scaled by n, for the MF, HR, and ML of the regression and copula parameters for the 
bivariate normal copula model with moderate dependence and logistic regression. The 
number of jitters was set at m = 10, 102, 10^, 10^. 



8. The variances for the MF and HR estimates are overestimated for the regression 
parameters of continuous covariates that do not vary a lot. 



These conclusions justify whv iMadsenI ( 20091 ) and iMadsen and Fand ( 20 111 ) do not report 
latent correlations to their applications and studies of efficiency. 



6 The toenail infection data 



In this section we re-analyze the toenail infection data in lMadsen and FaneJ ()201ll ). The data 
were obtained from a randomized, double-blind, parallel group, multicent er study for the 
comparison of two oral treatments for t oenail dermatophyte onychomycosis ( De Backer et al.l . 
19961 : iMolenberghs and Verbekd . l2005l ) . Subjects were followed during 12 weeks (3 months) 



of treatment and followed further, up to a total of 48 weeks (12 months). Measurements 
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Method m nBias nVar nMSE nV 

MF To -5.94 2A8 2^84 1.10 

10^ -5.89 2.49 2.83 1.15 

10^ -5.82 2.50 2.84 1.19 

HR /3o = -0.5 10 -5.88 2.43 2.78 1.03 

10^ -5.89 2.43 2.77 1.04 

10^ -5.88 2.43 2.77 1.04 

SL -1.36 2.33 2.35 2.28 

MF 10 -0.72 L09 LIO 1.32 

102 -0.55 1.07 1.08 1.28 

10^ -0.35 1.05 1.05 1.26 

HR /3i = 0.5 10 -1.06 1.03 1.04 1.39 

10^ -1.07 1.02 1.03 1.39 

10^ -1.07 1.02 1.03 1.39 

SL 0.12 0.89 0.89 0.83 

MF 10 -78.13 8.36 69.41 6.51 

10^ -77.77 8.13 68.62 6.51 

10^ -77.18 8.01 67.57 6.54 

HR 7 = 2 10 -77.27 8.71 68.42 6.62 

10^ -77.24 8.68 68.34 6.62 

10^ -77.24 8.68 68.33 6.62 

SL -1.32 19.03 19.05 18.59 

MF 10 -28.70 020 8^4 0.12 

10^ -25.96 0.18 6.92 0.11 

10^ -23.95 0.16 5.90 0.11 

HR p = 0.8 10 -33.23 0.19 11.23 0.13 

10^ -33.22 0.18 11.21 0.13 

10^ -33.21 0.18 11.21 0.13 

SL 0.88 0.11 0.12 0.11 

Table 9: Small sample of sizes n = 100 simulations (10^ replications) and resultant biases 

and mean square errors (MSE) and variances (Var), along with average theoretical variances 

scaled by n, for the HR, MF, and SL of the regression and copula parameters for the MVN 

{d = 5) copula model with strong dependence and NB2 regression. The number of jitters 
was set at m = 10, 10^, 10'^. 



were taken at baseline, every month during treatment, and every 3 months afterwards, re- 
sulting in a maximum of 7 measurements per subject. The observations are coded as 1 if 
the subject's infection was severe and otherwise. The question of interest was whether 
the percentage of severe infections decreased over time, an d whether that evolution was 
different for the two treatment groups. In accordance with iMadsen and FaneJ ( 201ll ) 



we 



use the 224 subjects observed at all seven time points, tho ugh the SL method does not 



depend on a constant "cluster" size d. In this data example, iMadsen and Fand (J201ll ) as- 
sumed an exchangeable structure for the MVN copula with logistic regressions, and hence 
this can be easily analyzed with the standard ML method. ML estimation is possible for an 
exchangeable structure, since the 7-dimensional rectangle reduces to 1-dimensional integral. 
However, for longitudinal data, it is common practice to use various parametric cor- 
relation matrices, so we have also calculated the estimates under an AR(1), and Markov 
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Dependence 




AR(1) 






Markov 


Link 


probit 


log 


;it 


probit 


logit 




Est. 


SE 


Est. 


SE 


Est. 


SE 


Est. SE 


Intercept 


-0.418 


0.116 


-0.642 


0.189 


-0.385 


0.119 


-0.587 0.194 


Treatment 


-0.006 


0.160 


0.009 


0.262 


-0.011 


0.165 


-0.006 0.268 


Time 


-0.099 


0.016 


-0.188 


0.032 


-0.111 


0.016 


-0.208 0.033 


Trt.xtime 


-0.022 


0.023 


-0.054 


0.048 


-0.021 


0.024 


-0.048 0.048 


P 


0.923 


0.015 


0.924 


0.015 


0.952 


0.010 


0.952 0.010 


I 


-412.58 


-410.06 


-405.71 


-403.49 



Table 10: Maximum SL log-likelihoods 
toenail infection data. 



's), estimates and standard errors (SE) for the the 



dependence with the SL method. Using a Markov model, the AR(1) model is generalized 
to times that are unequally-spaced, and this is the case for toenail infection data. For 
Markov dependence, R is taken as (/o'*^~*'''')i<j,A:<d- Further, both logit and probit links 
are used for the marginal binary regressions. In Table [TOt we report the resulting maxi- 
mum SL log-likelihoods (^'s), estimates and standard errors (SE) of the MVN copula-based 
models with binary regression. The SL log-likelihoods show that Markov dependence is 
marginall y better than AR(1) depen dence, and both are far better than exchangeable de- 
pendence (|Sabo and Chagantvl . l201ll . Table 1). Further, logistic regression is slightly better 
than probit regression. 

Based on our analysis and the exchangeable analysis, the standard errors show the time 
effect to be highly significant, and the treatment by time interaction insignificant; hence 
there is no (significant) difference in evolution between both treatment groups. However, 
for an AR(1) or Markov structure, the p-value for the treatment by time interaction is 
smaller than its counterpart for the exchangeable analysis. Generally speaking, this implies 
that ignoring the actual correlation structure in the data could lead to invalid conclusions, 
although this was not crucial in this example. 



7 Discussion 

In this paper we have studied two "approximate" likelihood estimation methods based on 
the CE of a discrete random variable. For the binary, Poisson, and negative binomial regres- 
sion models with the MVN copula, we have shown that these methods lead to substantial 
downward bias for the estimates of the latent correlation and the univariate marginal pa- 
rameters that they are not regression coefficients; for the latter parameters only for strong 
dependence. 

W e have shown that the simulated likelihood in iMadsenI ( 20091 ) and iMadsen and Fang 
( 201ll ) is a very inefficient way to compute a multivariate normal rectangle probability, 
since the importance weight is not bounded, and even the naive metho d is much better . 
The inefficiency of their method also yields to the claim that the GEE (JLiang and Zegerl . 
19861 ) is more efficient than the ML. A moment based estimate cannot be better than a 
maximum (simul ated) likelihood estimate, and t his has to do with the inefficiency of the 
MF method. See lSabo and Chagantyl ( 20 111 ) and ISong et al.l ( 201ll ) for further criticism. 

We have implemented a simulated likelihood method, where t he rec tangles are con- 
verted to bounded integrands via the method in iGenz and Breta ( 20021 ). and hence the 
computational and statistical efficiency of simulated likelihood is substantially improved. 
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and actually is as good as niaximum likelihood as shown for dimension 10 or lower. We 
expect our findings to hold in higher dimensions. Although there is an issue of computa- 
tional burden as the dimension and the sample size increase, this will become marginal, as 
computing technology is advancing rapidly. 

Appendix. Derivation of Q and C 

Split the n observations into the distinct sets n^^' = {i '■ yi = y''*',Xi = x'*'}. As n — )■ oo, 
then with convergence in probability. 






p(*) J' [$-i{Fy, (yf - 1; xf ) + vfy^ (yf ; xf )}] 



:= P^^ 



and 



,Vj X 



^^^{Fyj {Vik - 1; Xjfc) + Vikfvj iVik] Xjfc)} 
^ pW I' <^~\FY^{yf - l;xf) + VjfY,iyf;^f)}d 

f^ ^-^{Fy.ivf - l;xW) + v,fy,{yf-^^f)}dvk 

Note that Cj X] \ J = I, ■ ■ ■ ,d,t = 1, . . . , T are conditional expectations that have closed 
forms. For example with Z ~ N{0, 1), 

Cf = /'<i>-Hi^y,(?/f -l;xf) + ^,/y,(yf;xf)}dt;,- 

= ?7^ 77r / 2;(/)(z)<iz 

/y^,(2/f;xf)A-MF.,fer-i;xr)} 

= E[Z\<^-'{FY^{yf - l;xf )} < Z < cl.-i{FK,.(yf ;xf )}] 

and 

^f = s[z2|$-i{Fy^-(y?) - i;xW)} < z < $-Hi^y.(?/f ;xf )}]• 
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